智能论文笔记

Learn to See Faster: Pushing the Limits of High-Speed Camera with Deep Underexposed Image Denoising

Weihao Zhuang , Tristan Hascoet , Ryoichi Takashima , Tetsuya Takiguchi

分类：计算机视觉

2022-11-29

The ability to record high-fidelity videos at high acquisition rates is central to the study of fast moving phenomena. The difficulty of imaging fast moving scenes lies in a trade-off between motion blur and underexposure noise: On the one hand, recordings with long exposure times suffer from motion blur effects caused by movements in the recorded scene. On the other hand, the amount of light reaching camera photosensors decreases with exposure times so that short-exposure recordings suffer from underexposure noise. In this paper, we propose to address this trade-off by treating the problem of high-speed imaging as an underexposed image denoising problem. We combine recent advances on underexposed image denoising using deep learning and adapt these methods to the specificity of the high-speed imaging problem. Leveraging large external datasets with a sensor-specific noise model, our method is able to speedup the acquisition rate of a High-Speed Camera over one order of magnitude while maintaining similar image quality.

translated by 谷歌翻译

Optical Flow Regularization of Implicit Neural Representations for Video Frame Interpolation

Weihao Zhuang , Tristan Hascoet , Ryoichi Takashima , Tetsuya Takiguchi

分类：计算机视觉 | 机器学习

2022-06-22

最近的作品表明，隐式神经表示（INR）具有信号导数的有意义表示的能力。在这项工作中，我们利用该属性来执行视频框架插值（VFI），通过明确限制INR的衍生物以满足光流约束方程。我们仅使用目标视频及其光流，在有限的运动范围内实现了最先进的VFI，而无需从其他培训数据中学习插值操作员。我们进一步表明，限制INR衍生物不仅可以更好地插值中间框架，还可以提高狭窄网络适合观察到的帧的能力，这暗示了潜在的视频压缩和INR优化的应用。

translated by 谷歌翻译

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Sanghyun Woo , Shoubhik Debnath , Ronghang Hu , Xinlei Chen , Zhuang Liu , In So Kweon , Saining Xie

分类：计算机视觉

2023-01-02

Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.

translated by 谷歌翻译

Holistic Network Virtualization and Pervasive Network Intelligence for 6G

Xuemin , Shen , Jie Gao , Wen Wu , Mushu Li , Conghao Zhou , Weihua Zhuang

分类：人工智能

2023-01-02

In this tutorial paper, we look into the evolution and prospect of network architecture and propose a novel conceptual architecture for the 6th generation (6G) networks. The proposed architecture has two key elements, i.e., holistic network virtualization and pervasive artificial intelligence (AI). The holistic network virtualization consists of network slicing and digital twin, from the aspects of service provision and service demand, respectively, to incorporate service-centric and user-centric networking. The pervasive network intelligence integrates AI into future networks from the perspectives of networking for AI and AI for networking, respectively. Building on holistic network virtualization and pervasive network intelligence, the proposed architecture can facilitate three types of interplay, i.e., the interplay between digital twin and network slicing paradigms, between model-driven and data-driven methods for network management, and between virtualization and AI, to maximize the flexibility, scalability, adaptivity, and intelligence for 6G networks. We also identify challenges and open issues related to the proposed architecture. By providing our vision, we aim to inspire further discussions and developments on the potential architecture of 6G.

translated by 谷歌翻译

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Ruibo Liu , Chenyan Jia , Ge Zhang , Ziyu Zhuang , Tony X Liu , Soroush Vosoughi

分类：自然语言处理 | 人工智能

2023-01-01

We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.

translated by 谷歌翻译

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

Jiale Xu , Xintao Wang , Weihao Cheng , Yan-Pei Cao , Ying Shan , Xiaohu Qie , Shenghua Gao

分类：计算机视觉

2022-12-28

Recent CLIP-guided 3D optimization methods, e.g., DreamFields and PureCLIPNeRF achieve great success in zero-shot text-guided 3D synthesis. However, due to the scratch training and random initialization without any prior knowledge, these methods usually fail to generate accurate and faithful 3D structures that conform to the corresponding text. In this paper, we make the first attempt to introduce the explicit 3D shape prior to CLIP-guided 3D optimization methods. Specifically, we first generate a high-quality 3D shape from input texts in the text-to-shape stage as the 3D shape prior. We then utilize it as the initialization of a neural radiance field and then optimize it with the full prompt. For the text-to-shape generation, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. To narrow the style domain gap between images synthesized by the text-to-image model and shape renderings used to train the image-to-shape generator, we further propose to jointly optimize a learnable text prompt and fine-tune the text-to-image diffusion model for rendering-style image generation. Our method, namely, Dream3D, is capable of generating imaginative 3D content with better visual quality and shape accuracy than state-of-the-art methods.

translated by 谷歌翻译

Towards Disentangling Relevance and Bias in Unbiased Learning to Rank

Yunan Zhang , Le Yan , Zhen Qin , Honglei Zhuang , Jiaming Shen , Xuanhui Wang , Michael Bendersky , Marc Najork

分类：人工智能

2022-12-28

Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization will allow the relevance tower to be exempt from biases. In this work, we identify a critical issue that existing ULTR methods ignored - the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which would possess relevance information. We give both theoretical analysis and empirical results to show the negative effects on relevance tower due to such a correlation. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.

translated by 谷歌翻译

Learning List-Level Domain-Invariant Representations for Ranking

Ruicheng Xian , Honglei Zhuang , Zhen Qin , Hamed Zamani , Jing Lu , Ji Ma , Kai Hui , Han Zhao , Xuanhui Wang , Michael Bendersky

分类：人工智能 | 自然语言处理 | 机器学习

2022-12-21

Domain adaptation aims to transfer the knowledge acquired by models trained on (data-rich) source domains to (low-resource) target domains, for which a popular method is invariant representation learning. While they have been studied extensively for classification and regression problems, how they apply to ranking problems, where the data and metrics have a list structure, is not well understood. Theoretically, we establish a domain adaptation generalization bound for ranking under listwise metrics such as MRR and NDCG. The bound suggests an adaptation method via learning list-level domain-invariant feature representations, whose benefits are empirically demonstrated by unsupervised domain adaptation experiments on real-world ranking tasks, including passage reranking. A key message is that for domain adaptation, the representations should be analyzed at the same level at which the metric is computed, as we show that learning invariant representations at the list level is most effective for adaptation on ranking problems.

translated by 谷歌翻译

PAL: Persona-Augmented Emotional Support Conversation Generation

Jiale Cheng , Sahand Sabour , Hao Sun , Zhuang Chen , Minlie Huang

分类：自然语言处理

2022-12-19

Due to the lack of human resources for mental health support, there is an increasing demand for employing conversational agents for support. Recent work has demonstrated the effectiveness of dialogue models in providing emotional support. As previous studies have demonstrated that seekers' persona is an important factor for effective support, we investigate whether there are benefits to modeling such information in dialogue models for support. In this paper, our empirical analysis verifies that persona has an important impact on emotional support. Therefore, we propose a framework for dynamically inferring and modeling seekers' persona. We first train a model for inferring the seeker's persona from the conversation history. Accordingly, we propose PAL, a model that leverages persona information and, in conjunction with our strategy-based controllable generation method, provides personalized emotional support. Automatic and manual evaluations demonstrate that our proposed model, PAL, achieves state-of-the-art results, outperforming the baselines on the studied benchmark. Our code and data are publicly available at https://github.com/chengjl19/PAL.

translated by 谷歌翻译

Let's Negotiate! A Survey of Negotiation Dialogue Systems

Haolan Zhan , Yufei Wang , Tao Feng , Yuncheng Hua , Suraj Sharma , Zhuang Li , Lizhen Qu , Gholamreza Haffari

分类：自然语言处理

2022-12-18

Negotiation is one of the crucial abilities in human communication, and there has been a resurgent research interest in negotiation dialogue systems recently, which goal is to empower intelligent agents with such ability that can efficiently help humans resolve conflicts or reach beneficial agreements. Although there have been many explorations in negotiation dialogue systems, a systematic review of this task has to date remained notably absent. To this end, we aim to fill this gap by reviewing contemporary studies in the emerging field of negotiation dialogue systems, covering benchmarks, evaluations, and methodologies. Furthermore, we also discuss potential future directions, including multi-modal, multi-party, and cross-cultural negotiation scenarios. Our goal is to provide the community with a systematic overview of negotiation dialogue systems and to inspire future research.

translated by 谷歌翻译